Scientific Progress

This grant enabled the development of a new field in computer design and fabrication based on self-assembled resonance
energy transfer (RET) devices. Through this work we also discovered new ways to apply these devices, and we
were the first to develop a thermo-mechanical model of DNA that reconciles observed UV absorption changes with the actual
structure of complex DNA nanostructures. The major themes of our other accomplishments are: (1) proof-of-concept
high-density optical memory, (2) a physical-key-based information assurance system, (3) a network-on-chip computer
architecture based on RET, and (4) a high on/off ratio RET switch for all-optical computation with nanoscale device density.
Final Report


This report summarizes the progress made under ARO YIP grant 54696-EL-PCS
during the period 10/1/2009 to 9/30/2014 across the eight sub-project thrusts. The document is
organized  into  sections  that  describe  each  sub-project  with  an  emphasis  on  highlighting  important 
results  and  findings.  Each  sub-project  is  described  in  complete  detail  in  the  papers  identified  in  the 
bibliography. 

Project  1.  Design  Automation  for  Resonance  Energy  Transfer  Networks 

Resonance  energy  transfer  (RET)  circuits  are  networks  of  photo-active  molecules  that  can  implement 
arbitrary  logic  functions.  The  nanoscale  size  of  these  structures  can  bring  high-density  computation  to 
new  domains,  e.g.,  in  vivo  sensing  and  computation.  A  key  challenge  in  the  design  of  a  RET  network  is  to 
find,  among  a  huge  set  of  configurations  (i.e.,  design  space),  the  optimum  choice  and  arrangement  of 
molecules  on  a  nanostructure.  The  prohibitively  large  size  of  the  design  space  makes  it  impractical  to 
evaluate  every  possible  configuration,  motivating  the  need  for  design-space  pruning  to  be  integrated 
into  the  design  flow.  To  this  end,  this  project  developed  a  computer-aided  design  framework,  called 
RETLab,  that  enables  structured  pruning  of  the  design  space  to  extract  a  sufficiently  small  subset,  which 
is  fully  evaluated  and  ranked  based  on  user-defined  metrics  to  yield  the  best  configuration.  More 
importantly,  we  have  developed  a  new  RET-simulation  algorithm,  which  is  several  orders  of  magnitude 
(e.g.,  for  a  4-node  network,  one  million  times)  faster  than  conventional  Monte-Carlo-based  simulation 
(MCS). This speedup in configuration evaluation enabled a significantly more extensive design-space
exploration with fewer and less constrained heuristics than existing RET-network design
methods, which are ad hoc and rely on MCS for configuration evaluation.

We found that the actual functionality of a fabricated RET network usually deviated from its desired
functionality for two main reasons: (1) undesired changes to inter-chromophore distances imposed
by the underlying nanostructure, and (2) undesired RET properties dictated by the molecular structure of
the chromophores. We described the desired functionality by an exciton-flow graph (EFG), which served as
the ideal model. The automation problem we formulated was therefore to find the configuration that
yields the best behavioral match to the EFG.
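
The sensitivity to inter-chromophore distance noted above follows from standard Förster theory, in which transfer efficiency falls off with the sixth power of the donor-acceptor separation. The sketch below only illustrates that relationship; the Förster radius and the distances are illustrative assumptions, not values measured in this project.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Forster transfer efficiency E = 1 / (1 + (r / R0)^6)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


R0_NM = 5.0  # assumed Forster radius, for illustration only

# A shift of about 1 nm around R0 changes the efficiency substantially.
for r_nm in (4.0, 5.0, 6.0):
    print(f"r = {r_nm:.1f} nm -> E = {fret_efficiency(r_nm, R0_NM):.3f}")
```

Because of the sixth-power dependence, a small placement error on the nanostructure can turn an intended high-efficiency hop into a weak one, which is why a configuration has to be evaluated with its actual distances.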

Existing methods for RET network design were all ad hoc and limited to a particular functionality. Besides
lacking generalizability, these methods have low throughput and so cannot efficiently
explore large design spaces within a limited design time.

We developed a generic RET-network design framework with higher throughput than existing ad hoc
methods. The higher throughput of our design flow comes from avoiding unnecessary
simulations and from a novel simulation algorithm that, in addition to being highly precise,
is several orders of magnitude faster than conventional simulation methods. This throughput
enabled a more extensive exploration of larger design spaces than existing methods allow,
making RETLab the first efficient framework for optimizing RET circuit components and devices.
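
This report does not describe the RETLab simulation algorithm itself. As a hedged illustration of why a deterministic evaluation can beat Monte Carlo sampling, the sketch below treats exciton hopping in a small network as an absorbing Markov chain and compares the exact exit probabilities with a sampled estimate. The three-node network, the rate values, and all function names are hypothetical.

```python
import random

# Hypothetical 3-chromophore chain; rates in 1/ns are illustrative only.
# transfer[i][j] is the RET rate from node i to node j; decay[i] is the
# total rate at which an exciton on node i leaves the network.
transfer = [
    [0.0, 2.0, 0.0],
    [0.5, 0.0, 3.0],
    [0.0, 0.1, 0.0],
]
decay = [0.25, 0.25, 1.0]


def exit_probabilities(transfer, decay, start, tol=1e-12):
    """Probability that an exciton created on `start` exits from each node."""
    n = len(decay)
    total = [sum(transfer[i]) + decay[i] for i in range(n)]
    occupancy = [0.0] * n
    occupancy[start] = 1.0
    exits = [0.0] * n
    # Propagate the occupancy distribution hop by hop until it has drained.
    while sum(occupancy) > tol:
        nxt = [0.0] * n
        for i, p in enumerate(occupancy):
            exits[i] += p * decay[i] / total[i]
            for j in range(n):
                nxt[j] += p * transfer[i][j] / total[i]
        occupancy = nxt
    return exits


def monte_carlo(transfer, decay, start, trials, seed=0):
    """Estimate the same probabilities by sampling individual excitons."""
    rng = random.Random(seed)
    n = len(decay)
    counts = [0] * n
    for _ in range(trials):
        node = start
        while True:
            weights = transfer[node] + [decay[node]]
            choice = rng.choices(range(n + 1), weights=weights)[0]
            if choice == n:  # the exciton left the network from this node
                counts[node] += 1
                break
            node = choice
    return [c / trials for c in counts]


exact = exit_probabilities(transfer, decay, start=0)
sampled = monte_carlo(transfer, decay, start=0, trials=20000)
print("exact  :", [round(p, 4) for p in exact])
print("sampled:", [round(p, 4) for p in sampled])
```

The deterministic pass needs a few dozen vector updates, while the sampled estimate needs many thousands of trials to reach comparable precision. That gap is the kind of cost difference that makes exhaustive evaluation of a pruned design space practical.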


Project  2.  Optical  Storage  Based  on  Self-assembled  DNA  Networks 

Building  on  the  potential  for  DNA  to  achieve  vast  enhancements  in  storage  density,  this  project 
demonstrated  a  novel  retrieval  technique  which  we  call  polychromatic  address  multiplexing  (PAM)  to 
access  multiple  bits  stored  in  the  same  cell.  By  exploiting  nanoscale  fluorescence-based  storage 
elements, PAM enables storage of hundreds of bits in a single cell. The technique thus achieves storage
densities beyond the diffraction limit and can also be incorporated
into conventional optical-storage technologies.

We demonstrated the information retrieval technique and showed that it has the potential to enable
up to a 5500X increase in the areal storage density of current optical discs, beyond the diffraction
limit.  PAM  exploits  acceptor  saturation  in  precisely-assembled  nanoscale  FRET-based  structures  to 
induce  a  fluorescence  increase  which  is  exclusively  caused  by  the  addressed  structures  while  other 
structures  remain  inactive.  Using  commercially  available  dyes,  we  synthesized  two  kinds  of  storage 
elements  to  experimentally  demonstrate,  as  a  proof-of-concept,  storage  cell  capacities  of  two,  three, 
and  four  bits.  We  also  simulated  6-color  storage-elements  based  on  PAM-optimized  synthetic  dyes  and 
the  channel  capacities  were  shown  to  be  at  least  six  bits.  The  exponential  growth  of  the  address  space 
enabled  by  PAM  makes  it  an  efficient  technique  for  many  data-multiplexing  applications  including,  but 
not  limited  to,  optical  storage  media. 

Project  3.  Self-assembling  Circuit  Defect  Modeling 

The self-assembly of nanoelectronic devices provides an opportunity to achieve unprecedented density
and manufacturing scale in the post-Moore's Law era. Bottom-up DNA self-assembly has emerged as a
promising technique for achieving this vision, and it has been used to demonstrate precise
patterning and functionalisation at resolutions below 20 nm. However, a lack of understanding of
fabrication defects and their impact on circuit behaviour is a major obstacle to the eventual application
of these substrates to circuit design. This project developed a classification of defects observed in our
experimental  work  on  self-assembled  nanostructures.  Atomic  force  microscope  (AFM)  images  were  used 
to  study  these  defects  and  determine  their  relative  frequencies.  We  then  connected  these  defects  to 
fault models and predicted their likely impact on the behavior of logic gates. Based on SPICE
(Simulation Program with Integrated Circuit Emphasis) simulation data for the proposed layouts, we concluded that
there is a predictive connection between faulty logic behavior and physical defects for future DNA
self-assembled nanoelectronics. This work will be useful in predicting the potential success of
defect-tolerance techniques for DNA self-assembled nanoelectronic substrates.

We  characterized  the  structural  defects  that  occur  during  bottom-up  DNA  self-assembly.  We  developed 
a  classification  of  these  defects  using  experimental  data  from  AFM  images  of  self-assembled  DNA 
nanostructures.  These  images  have  been  used  to  study  defects  in  detail  and  to  determine  their  relative 
frequencies  of  occurrence.  We  have  classified  defects  as  being  either  structural  defects  that  can 
currently  be  observed,  or  anticipated  defects  that  are  expected  to  occur  in  future  nanostructures.  We 
related  these  defects  to  circuit-level  fault  models  and  predicted  their  likely  impact  on  the  behavior  of 
logic  gates.  We  then  developed  a  set  of  fault  models  for  CMOS  and  carbon-nanotube  (CNT)  layouts  of  an 
inverter, a NAND gate, and a NOR gate. These gates were simulated using the SPICE3f5 software with
several fault injections to compare the outputs of the two technologies. Missing functionalisation sites
and tiles make up approximately 65% of the known physical defects. These defects tend to result in
single stuck-line faults, and we assume the same holds for anticipated defects.
Nonspecific functionalisation defects result in potential delay faults. However, these faults are not
observable  at  the  simulated  switching  frequencies  for  the  CNT  models.  Future  work  will  continue  to 
explore  the  potential  connections  between  physical  defects  and  fault  models  for  more  complex  circuits 
and  complex  faults  such  as  bridging  faults. 
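
The link between a physical defect and a stuck-line fault can be illustrated at the logic level. The sketch below is not the SPICE3f5 setup used in the project. It is a minimal Boolean model, with hypothetical line names, that injects a single stuck-at fault into an inverter, NAND, or NOR gate and lists the input vectors that expose it.

```python
from itertools import product

# Gate name -> (number of inputs, Boolean function on 0/1 values).
GATES = {
    "INV": (1, lambda a: 1 - a),
    "NAND": (2, lambda a, b: 1 - (a & b)),
    "NOR": (2, lambda a, b: 1 - (a | b)),
}


def truth_table(gate, fault=None):
    """Outputs for all input vectors; fault is None or (line, stuck_value).

    `line` is "in0", "in1", or "out" (hypothetical names for this sketch).
    """
    arity, fn = GATES[gate]
    outputs = []
    for inputs in product((0, 1), repeat=arity):
        driven = list(inputs)
        if fault and fault[0].startswith("in"):
            index = int(fault[0][2:])
            if index < arity:
                driven[index] = fault[1]  # the input line is stuck
        out = fn(*driven)
        if fault and fault[0] == "out":
            out = fault[1]  # the output line is stuck
        outputs.append(out)
    return outputs


def detecting_vectors(gate, fault):
    """Input vectors whose output differs between the good and faulty gate."""
    arity, _ = GATES[gate]
    good = truth_table(gate)
    bad = truth_table(gate, fault)
    vectors = product((0, 1), repeat=arity)
    return [v for v, g, b in zip(vectors, good, bad) if g != b]


print(detecting_vectors("NAND", ("in0", 1)))
print(detecting_vectors("NOR", ("out", 0)))
```

A missing tile or functionalisation site that leaves a gate input permanently disconnected would map onto one of these stuck-line cases, whereas a delay fault needs a timing-aware simulator and is outside the scope of this Boolean sketch.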


Project  4.  Integration  of  Sensing  and  Computation  at  the  Nanoscale 

This  project  explored  the  architectural  implications  of  integrating  computation  and  molecular  probes  to 
form  nanoscale  sensor  processors  (nSP).  We  showed  how  nSPs  may  enable  new  computing  domains  and 
automate  tasks  that  currently  require  expert  scientific  training  and  costly  equipment.  This  new 
application  domain  severely  constrains  nSP  size,  which  significantly  impacts  the  architectural  design 
space.  In  this  context,  we  explored  nSP  architectures  and  developed  an  nSP  design  that  includes  a 
simple  accumulator-based  ISA,  sensors,  limited  memory  and  communication  transceivers.  To  reduce  the 
application  memory  footprint,  we  introduced  the  concept  of  instruction-fused  sensing.  We  used 
simulation  and  analytical  models  to  evaluate  nSP  designs  executing  a  representative  set  of  target 
applications.  Furthermore,  we  designed  a  candidate  nSP  technology  based  on  optical  Resonance  Energy 
Transfer  (RET)  logic  that  enables  the  small  size  required  by  the  application  domain;  our  smallest  design  is 
about  the  size  of  the  largest  known  virus.  We  also  showed  laboratory  results  that  demonstrate  initial 
steps  towards  a  simple  prototype. 

Two driving forces on computer architecture are application requirements and technology change. The
combination of important problems in the life sciences and advances in materials science is exposing a
new computational domain: biological-scale integrated sensing and processing. The ability to utilize
programmable  devices  at  biological  scales  may  enable  life  scientists  to  perform  hypothesis  testing 
previously  thought  impossible.  This  domain  presents  new  challenges  to  computer  architects  due  to  the 
extreme  size  constraints:  a  device  must  be  capable  of  diffusing  through  small  volumes  while  still  meeting 
application  requirements.  This  project  developed  the  first-of-its-kind  architecture  for  nanoscale  sensing 
and  processing.  We  analyzed  the  application  characteristics  (e.g.,  long  time  scales  and  common 
operations)  to  design  a  multicycle  accumulator-based  architecture.  A  novel  aspect  of  this  architecture  is 
the  use  of  instruction-fused  sensing  that  exploits  the  unified  use  of  nanoscale  devices  for  both  sensing 
and  logic  design  to  allow  sensors  to  directly  modify  logic  values  (i.e.,  instruction  opcode  bits).  We 
implemented  several  representative  applications  that  execute  on  the  architecture  and  demonstrated 
capabilities  (e.g.,  sensing  based  on  complex  logic)  beyond  those  achievable  with  current  simple 
biological  sensors.  This  work  represented  our  first  steps  toward  developing  biological  scale  computing 
systems. 
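
To make instruction-fused sensing concrete, the sketch below models a toy accumulator machine in which a sensor can drive the low bit of an instruction's opcode. The opcode encoding, the instruction format, and the sensor indexing are all hypothetical and are not the nSP ISA described in the papers.

```python
# Hypothetical 2-bit opcodes for a toy accumulator machine.
NOP, INC, CLR, OUT = 0b00, 0b01, 0b10, 0b11


def step(acc, opcode, fused_sensor, sensors, outputs):
    """Execute one instruction; a fused sensor is ORed into opcode bit 0."""
    if fused_sensor is not None:
        opcode |= sensors[fused_sensor] & 1
    if opcode == INC:
        acc += 1
    elif opcode == CLR:
        acc = 0
    elif opcode == OUT:
        outputs.append(acc)
    return acc


def run(program, samples):
    """Run the whole program once per sensor sample (a list of 0/1 values)."""
    acc, outputs = 0, []
    for sensors in samples:
        for opcode, fused_sensor in program:
            acc = step(acc, opcode, fused_sensor, sensors, outputs)
    return acc, outputs


# One instruction counts detections: NOP becomes INC whenever sensor 0 fires.
count_program = [(NOP, 0)]
acc, _ = run(count_program, samples=[[1], [0], [1], [1]])
print("detections counted:", acc)
```

In this toy model a single fused instruction replaces a read-sensor, compare, and conditional-branch sequence, which is one plausible reading of how the technique reduces the application memory footprint.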


Project  5.  Label-free  Sensing  with  Self-assembling  Nanoscale  Systems 

The  self-assembly  of  molecularly  precise  nanostructures  is  widely  expected  to  form  the  basis  of  future 
high-speed  integrated  circuits,  but  the  technologies  suitable  for  such  circuits  are  not  well  understood.  In 
this  project,  DNA  self-assembly  was  used  to  create  molecular  logic  circuits  that  can  selectively  identify 
specific  biomolecules  in  solution  by  encoding  the  optical  response  of  near-field  coupled  arrangements  of 
chromophores.  The  resulting  circuits  can  detect  label-free,  femtomole  quantities  of  multiple  proteins, 
DNA  oligomers,  and  small  fragments  of  RNA  in  solution  via  ensemble  optical  measurements.  This 
method,  which  is  capable  of  creating  multiple  logic-gate-sensor  pairs  on  a  2X80X80-nm  DNA  grid,  was  a 
first  step  toward  more  sophisticated  nanoscale  logic  circuits  capable  of  interfacing  computers  with 
biological  processes. 

The  methods  developed  in  this  project  demonstrated  the  feasibility  of  sensing  biomolecules  with  RET 
logic.  The  method  takes  advantage  of  DNA  self-assembly  to  build  nanoscale  grids  with  integrated  RET 
circuits  designed  to  operate  as  digital  multiplexers.  However,  an  important  aspect  of  this  work  was  to 
demonstrate  viable  pathways  towards  building  larger  RET  circuits  suitable  for  high-throughput  sensing 
of  many  analytes,  simultaneously.  To  do  this,  we  used  Boolean  logic  to  create  a  multiplexer  by  designing 
logic  terms  that  enable  a  distinct  output  for  each  unique  sensor.  Upon  excitation,  or  addressing,  we 
observed  the  output  from  one  and  only  one  kind  of  sensor.  In  this  way,  it  is  possible  to  combine  many 
such  sensors  into  a  single  monolith,  smaller  than  the  diffraction  limit,  as  long  as  each  address  uniquely 
identifies a single kind of sensor. The logprime-output-encoding technique we developed trades
signal-to-noise for address space by requiring a greater degree of significance in any measured response.

Thus, single-photon-counting techniques with an ability to detect attomoles of chromophores, dilute
gates where the analyte is in excess, and a library of only six distinct chromophores (i.e., five inputs
and one output) will enable sensing of over 24 analytes on a single 4X4 DNA grid. After accounting for
yield (35%) and a diffraction-limited spot size of approximately 700 nm^2 (e.g., using 600-nm fluorescent
output), the overall sensor density could be as high as 10^13 m^-2 unique analytes, which exceeds
the density of next-generation sensor arrays (e.g., gene chips with 500-nm-diameter probe spots) by an
order of magnitude.
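
The addressing scheme and the density estimate can be sketched together. The multiplexer below is a plain Boolean model with hypothetical sensor names and addresses. The density arithmetic reads the quoted spot size as a square 700 nm on a side and assumes one grid per spot; both are interpretations, since the report gives only the final figure.

```python
def make_mux(sensor_addresses):
    """Build a read function for sensors that share one optical output.

    sensor_addresses maps a sensor name to its address, a tuple of 0/1
    excitation inputs. Each address must identify one kind of sensor.
    """
    if len(set(sensor_addresses.values())) != len(sensor_addresses):
        raise ValueError("each address must identify a single kind of sensor")

    def read(address, bound):
        # The output is high only if the addressed sensor has bound its analyte.
        return int(any(address == addr and bound.get(name, False)
                       for name, addr in sensor_addresses.items()))

    return read


# Hypothetical two-sensor example with two address inputs.
read = make_mux({"protein_A": (0, 1), "dna_B": (1, 0)})
print(read((0, 1), {"protein_A": True}))  # addressed sensor is bound
print(read((1, 0), {"protein_A": True}))  # a different sensor is addressed

# Density estimate under the stated assumptions.
analytes_per_grid = 24
yield_fraction = 0.35
spot_side_m = 700e-9            # assumed: square spot, 700 nm on a side
spot_area_m2 = spot_side_m ** 2
density_per_m2 = analytes_per_grid * yield_fraction / spot_area_m2
print(f"{density_per_m2:.2e} unique analytes per m^2")
```

Under these assumptions the estimate comes out on the order of 10^13 per square metre, consistent with the figure quoted above.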


Project  6.  Molecular-scale  Network-on-chip  (NoC) 

In  this  project,  we  explored  the  use  of  emerging  molecular  scale  devices  to  construct  nanophotonic 
networks  called  Molecular-scale  Network-on-Chip  (mNoC).  We  leveraged  quantum  dot  LEDs,  which 
provide  electrical  to  optical  signal  modulation,  and  chromophores,  which  provide  optical  signal  filtering 
for  receivers.  These  devices  replace  the  ring  resonators  and  the  external  laser  source  used  in 
contemporary nanophotonic NoCs. We developed several crossbar structures, including Single Writer
Multiple Reader (SWMR), and studied the implications of the new mNoC crossbar on overall
system design. An mNoC SWMR crossbar can scale up to radix 256, and our preliminary evaluation shows
that it reduces average packet latency by over 50% and power consumption by 40% compared with a ring-based
alternative.

The  chromophores  replace  the  ring  resonators  and  the  quantum  dot  LED  replaces  the  external  laser 
source used in current nanophotonic NoCs. We also studied SWMR, MWSR, and MWMR bus-based
crossbar mNoCs and showed that, without the limitations of current nanophotonic networks, an SWMR mNoC
crossbar can easily scale to a radix-256 crossbar, larger than any existing NoC. We evaluated the SWMR
mNoC with both synthetic benchmarks and a few PARSEC benchmarks. The synthetic benchmark
simulation results showed that, compared with rNoC and eMesh, mNoC reduces average
latency (in cycles) by at least half and has a higher tolerance for network traffic. mNoC also exhibits a
performance improvement over rNoC and eMesh with PARSEC benchmarks. Furthermore, mNoC trades
static power for dynamic power and greatly reduces power consumption. An mNoC can achieve a 75%
reduction in power for a 64 X 64 crossbar compared to similar ring-resonator-based designs.

Additionally, an mNoC can scale to a 256 X 256 crossbar with a 40% power reduction. A large single
crossbar allows for high-radix routers and efficient broadcast-based directory protocols,
which are immediately useful in modern computer processor designs.
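
The single-writer multiple-reader organization can be illustrated with a small behavioral model. The sketch below is a simplification with hypothetical class and method names. It captures only the property that each node owns its own channel, so writers never contend with each other, and it ignores reader-side buffering, timing, and all photonic detail.

```python
class SWMRCrossbar:
    """Toy single-writer multiple-reader crossbar: one channel per writer."""

    def __init__(self, radix: int):
        self.radix = radix
        self.channels = [None] * radix  # channel i is written only by node i

    def write(self, src: int, dst: int, payload: str) -> None:
        """Place one flit on the channel owned by `src` for this cycle."""
        if self.channels[src] is not None:
            raise RuntimeError(f"node {src} already wrote this cycle")
        self.channels[src] = (dst, payload)

    def cycle(self) -> dict:
        """Every reader listens on all channels and keeps flits addressed to it."""
        delivered = {node: [] for node in range(self.radix)}
        for src, flit in enumerate(self.channels):
            if flit is not None:
                dst, payload = flit
                delivered[dst].append((src, payload))
        self.channels = [None] * self.radix
        return delivered


# All 256 nodes send in the same cycle without writer-side arbitration.
xbar = SWMRCrossbar(radix=256)
for node in range(xbar.radix):
    xbar.write(node, (node + 1) % xbar.radix, f"msg-{node}")
delivered = xbar.cycle()
print(delivered[0])
```

Because every reader observes every channel, a broadcast is just a write that all readers accept, which is why a large single crossbar suits broadcast-based directory protocols.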

Project  7.  Optical  Logic  Elements 

Optical  nanoscale  computing  is  one  promising  alternative  to  the  CMOS  process.  For  this  project  we 
explored  the  application  of  Resonance  Energy  Transfer  (RET)  logic  to  common  digital  circuits.  We 
proposed  an  Optical  Logic  Element  (OLE)  as  a  basic  unit  from  which  larger  systems  can  be  built.  An  OLE 
is a layered structure that works similarly to a lookup table but instead uses wavelength division
multiplexing  for  its  inputs  and  output.  Waveguides  provide  a  convenient  mechanism  to  connect  multiple 
OLEs  into  large  circuits.  We  built  a  SPICE  model  from  first  principles  for  each  component  to  estimate  the 
timing  and  power  behavior  of  the  OLE  system.  We  analyzed  various  logic  circuits  and  the  simulation 
results  show  that  the  components  are  theoretically  correct  and  that  the  models  faithfully  reproduce  the 
fundamental  phenomena;  the  power  delay  product  of  OLE  systems  is  at  least  2.5x  less  than  the  14  nm 
CMOS  technology  node  with  100x  better  density. 
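
The lookup-table analogy can be sketched as a function from the set of wavelengths present at the input to the set emitted at the output. The wavelengths, the two-input configuration, and the function names below are hypothetical; the sketch models only the logical behavior and says nothing about timing, power, or the layered physical structure.

```python
# Hypothetical wavelength assignments in nanometres.
INPUT_WAVELENGTHS_NM = (450, 520)   # one wavelength per logical input
OUTPUT_WAVELENGTH_NM = 650          # emitted when the output is logic 1


def make_ole(truth_table):
    """Return an OLE-like function configured by a lookup table.

    truth_table maps a tuple of input bits to the output bit.
    """
    def ole(light_in: frozenset) -> frozenset:
        # An input bit is 1 when its wavelength is present in the incoming light.
        bits = tuple(int(w in light_in) for w in INPUT_WAVELENGTHS_NM)
        if truth_table[bits]:
            return frozenset({OUTPUT_WAVELENGTH_NM})
        return frozenset()

    return ole


AND_TABLE = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
and_ole = make_ole(AND_TABLE)
print(and_ole(frozenset({450, 520})))
print(and_ole(frozenset({450})))
```

Changing the table changes the logic function without changing the interface, which is what allows identical elements to be chained through waveguides into larger circuits.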

Unlike  traditional  CMOS,  an  OLE  uses  wavelength  division  multiplexing  for  its  inputs  and  output. 
Conventional  optical  fibers  are  used  in  a  platform  for  studying  the  OLE  structure  and  in  the  future  will 
serve  as  wires  to  interconnect  multiple  OLEs  into  larger  systems.  To  better  understand  the 
characteristics  of  the  system  and  to  be  useful  as  a  guide  for  future  experimental  demonstration,  a  SPICE 
model  for  each  component  including  a  mapped  power  model  was  built  from  first  principles  and  verified 
with  experimental  results.  Various  combinational  and  sequential  circuits  were  designed  and  simulated 
to  estimate  their  timing  and  power  behavior.  The  simulation  results  suggest  that  the  propagation  delay, 
rise  and  fall  times  in  this  system  are  much  faster  than  the  22  nm  CMOS  technology  node  and  that  the 
system  is  at  least  2.5x  more  energy  efficient  as  well.  Moreover,  the  area  density  of  the  OLE-based 
technology  is  at  least  two  orders  of  magnitude  better  than  the  state-of-the-art  CMOS  technology. 

Future  work  will  focus  on  the  experimental  demonstration  of  integrated  OLEs  and  the  study  of  various 
methods to improve the longevity of the chromophores within. Beyond molecular techniques to solve
these problems, we will also investigate various fault-tolerant circuit design approaches as they apply to
building larger systems from OLEs.

Project  8.  Routing  Algorithms  for  Irregular  Self-assembled  Networks 

The integration of novel nanotechnologies onto silicon platforms is likely to increase fabrication
defects compared with traditional CMOS technologies. Furthermore, the number of nodes connected
with  these  networks  makes  acquiring  a  global  defect  map  impractical.  As  a  result,  on-chip  networks  will 
provide  defect  tolerance  by  self-organizing  into  irregular  topologies.  In  this  scenario,  simple  static 
routing  algorithms  based  on  regular  physical  topologies,  such  as  meshes,  will  be  inadequate. 

Additionally,  previous  routing  approaches  for  irregular  networks  assume  abundant  resources  and  do  not 
apply  to  this  domain  of  resource-constrained  self-organizing  nano-scale  networks.  Consequently,  routing 
algorithms  that  work  in  irregular  networks  with  limited  resources  are  needed.  In  this  project,  we 
explored  routing  for  self-organizing  nano-scale  irregular  networks  in  the  context  of  a  Self-Organizing 
SIMD  Architecture  (SOSA).  Our  approach  traded  configuration  time  and  a  small  amount  of  storage  for 
reduced  communication  latency.  We  augmented  an  Euler  path-based  routing  technique  for  trees  to 
generate  static  shortest  paths  between  certain  pairs  of  nodes  while  remaining  deadlock  free. 
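
The trade-off between tree-based routing and stored shortest paths can be sketched on a small irregular graph. The code below is an illustration under stated assumptions, not the SOSA algorithm. It builds a spanning tree, routes along an Euler tour of that tree, and falls back to a precomputed shortest path for selected node pairs. The example topology and all function names are hypothetical, and the sketch does not model the deadlock-freedom mechanism.

```python
from collections import deque


def spanning_tree(adj, root):
    """Breadth-first spanning tree, returned as a parent -> children map."""
    children = {node: [] for node in adj}
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                children[u].append(v)
                queue.append(v)
    return children


def euler_tour(children, root):
    """Node sequence that walks every tree edge once in each direction."""
    tour = [root]

    def visit(u):
        for v in children[u]:
            tour.append(v)
            visit(v)
            tour.append(u)

    visit(root)
    return tour


def euler_route(tour, src, dst):
    """Follow the tour forward, wrapping around, from src until dst."""
    if src == dst:
        return [src]
    cycle = tour[:-1]  # the tour starts and ends at the root
    i = cycle.index(src)
    path = [src]
    while path[-1] != dst:
        i = (i + 1) % len(cycle)
        path.append(cycle[i])
    return path


def shortest_path(adj, src, dst):
    """Breadth-first shortest path over all links (assumes connectivity)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


# Hypothetical irregular topology: a 4-node ring.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
tour = euler_tour(spanning_tree(adj, 0), 0)
shortcuts = {(3, 2): shortest_path(adj, 3, 2)}  # built at configuration time


def route(src, dst):
    return shortcuts.get((src, dst)) or euler_route(tour, src, dst)


print(euler_route(tour, 3, 2))
print(route(3, 2))
```

The shortcut table costs configuration time and a small amount of storage per node pair, and in exchange the selected pairs avoid the longer walk around the tree.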

Simulations  of  several  applications  executing  on  SOSA  showed  the  routing  algorithm  can  reduce 
execution  time  by  8%  to  30%. 

Moreover,  this  project  showed  that  even  if  there  are  up  to  30%  defective  nodes  or  transceivers  in  the 
networks,  we  can  still  obtain  a  similar  reduction  in  total  execution  time.  Although  our  analysis  is  specific 
to  one  system,  we  believe  that  the  general  approach  of  providing  optimized  routing  between  specific 
pairs  of  nodes  is  applicable  to  a  broader  set  of  systems,  particularly  those  that  implement  logical 
hierarchical  ring  topologies  on  top  of  irregular  physical  topologies. 